-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syntactic indexing: enqueuer and scheduler #62485
Conversation
11d5272
to
03414ce
Compare
It initialises a new database which of course doesn't contain any of the test data
Caution License checking failed, please read: how to deal with third parties licensing. |
Caution License checking failed, please read: how to deal with third parties licensing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some initial comments; will take a deeper look at more of the code shortly.
internal/codeintel/syntactic_indexing/internal/policy_iterator.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions
- (Strong) Let's avoid doing the repo check again inside the doubly nested loop.
- (Weak) Let's avoid doing the commit check multiple times
- (Weak) Let's avoid revisiting the same commit multiple times per repo
- (Weak) Let's simplify the env vars. We can make it more complicated later.
- (Strong) Please clarify the policies array elements in
scheduler_test.go
Would prefer Strong suggestions to be addressed pre-merge.
Thanks for your patience. It's been a bit hard for me to wrap my head around all the logic here, so this took longer to review than anticipated.
Additionally, harden the types used for repository and commit values
commitsToSchedule := make(map[api.RepoID]collections.Set[api.CommitID]) | ||
enqueueOptions := EnqueueOptions{force: false} | ||
|
||
var allErrors errors.MultiError | ||
|
||
for _, repoToIndex := range repos { | ||
repo, _ := s.RepoStore.Get(ctx, api.RepoID(repoToIndex.ID)) | ||
policyIterator := internal.NewPolicyIterator(s.PoliciesService, repoToIndex.ID, internal.SyntacticIndexing, schedulerConfig.PolicyBatchSize) | ||
err := policyIterator.ForEachPoliciesBatch(ctx, func(policies []policiesshared.ConfigurationPolicy) error { | ||
commitMap, err := s.PolicyMatcher.CommitsDescribedByPolicy(ctx, int(repoToIndex.ID), repo.Name, policies, currentTime) | ||
|
||
if err != nil { | ||
return err | ||
} | ||
|
||
for commit, policyMatches := range commitMap { | ||
if len(policyMatches) == 0 { | ||
continue | ||
} | ||
if commits := commitsToSchedule[repo.ID]; commits != nil { | ||
commits.Add(api.CommitID(commit)) | ||
} else { | ||
commitsToSchedule[repo.ID] = collections.NewSet(api.CommitID(commit)) | ||
} | ||
} | ||
|
||
return nil | ||
}) | ||
|
||
if err != nil { | ||
allErrors = errors.Append(allErrors, errors.Newf("Failed to discover commits eligible for syntactic indexing for repo [%s]: %v", repo.Name, err)) | ||
} | ||
} | ||
|
||
for repoId, commits := range commitsToSchedule { | ||
for _, commitId := range commits.Values() { | ||
if _, err := s.Enqueuer.QueueIndexingJobs(ctx, repoId, commitId, enqueueOptions); err != nil { | ||
allErrors = errors.Append(allErrors, errors.Newf("Failed to schedule syntactic indexing of repo [ID=%s], commit [%s]: %v", repoId, commitId, err)) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@varungandhi-src I've re-done this part, main highlights:
- Collecting all repos and commits once
- Removing fail fast behavior in repo iterator
- Removing fail fast behavioru in (repo, commit) iterator
Additionally the enqueuer no longer performs revision checks, we'll leave that to the syntactic worker itself.
This means the scheduler will always do best effort scheduling, while returning all accumulated errors.
I hope my understanding of return allErrors
is right here and equivalent to return nil
if no allErrors = errors.Append
calls ever happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks much better!
I hope my understanding of return allErrors is right here and equivalent to return nil if no allErrors = errors.Append calls ever happened.
Yep, we're following this pattern elsewhere too. I think you don't even need to declare the type as MultiError
, you can see other places where errors.Append(
can take the first argument as just error
instead of MultiError
. (search for var errs error
)
I will merge this PR and if there are any other serious issues to attend to, it can happen in subsequent PRs, given this component is disabled by default. |
Fixes GRAPH-124
Fixes GRAPH-121
This PR introduces three main components:
Refactoring:
TODO:
[ ] Tests for policy iterator- we're currently investigating whether policy iterator is needed at all, as pagination of policies is in question.Test plan